Deepfakes在良好的信仰应用中越来越受欢迎,例如在娱乐和恶意预期的操作,例如在图像和视频伪造中。主要是由后者的动机,最近已经提出了大量的浅频道探测器以识别此类内容。虽然这种探测器的性能仍然需要进一步的改进,但它们通常以简单的话进行评估,如果不是琐碎的情景。特别地,诸如转码,去噪,调整和增强的良性处理操作的影响是不充分研究。本文提出了一种更严格和系统的框架,以评估DeepFake探测器在更现实情况中的性能。它定量测量每个良性处理方法如何以及在艺术最先进的深蓝检测方法的情况下衡量如何和何种程度。通过在流行的DeepFake探测器中说明它,我们的基准测试提出了一种框架来评估探测器的稳健性,并提供有价值的洞察设计更高效的DeeFake探测器。
translated by 谷歌翻译
Increasingly taking place in online spaces, modern political conversations are typically perceived to be unproductively affirming -- siloed in so called ``echo chambers'' of exclusively like-minded discussants. Yet, to date we lack sufficient means to measure viewpoint diversity in conversations. To this end, in this paper, we operationalize two viewpoint metrics proposed for recommender systems and adapt them to the context of social media conversations. This is the first study to apply these two metrics (Representation and Fragmentation) to real world data and to consider the implications for online conversations specifically. We apply these measures to two topics -- daylight savings time (DST), which serves as a control, and the more politically polarized topic of immigration. We find that the diversity scores for both Fragmentation and Representation are lower for immigration than for DST. Further, we find that while pro-immigrant views receive consistent pushback on the platform, anti-immigrant views largely operate within echo chambers. We observe less severe yet similar patterns for DST. Taken together, Representation and Fragmentation paint a meaningful and important new picture of viewpoint diversity.
translated by 谷歌翻译
Reinforcement learning (RL) has shown great promise with algorithms learning in environments with large state and action spaces purely from scalar reward signals. A crucial challenge for current deep RL algorithms is that they require a tremendous amount of environment interactions for learning. This can be infeasible in situations where such interactions are expensive; such as in robotics. Offline RL algorithms try to address this issue by bootstrapping the learning process from existing logged data without needing to interact with the environment from the very beginning. While online RL algorithms are typically evaluated as a function of the number of environment interactions, there exists no single established protocol for evaluating offline RL methods.In this paper, we propose a sequential approach to evaluate offline RL algorithms as a function of the training set size and thus by their data efficiency. Sequential evaluation provides valuable insights into the data efficiency of the learning process and the robustness of algorithms to distribution changes in the dataset while also harmonizing the visualization of the offline and online learning phases. Our approach is generally applicable and easy to implement. We compare several existing offline RL algorithms using this approach and present insights from a variety of tasks and offline datasets.
translated by 谷歌翻译
Spurious correlations, or correlations that change across domains where a model can be deployed, present significant challenges to real-world applications of machine learning models. However, such correlations are not always "spurious"; often, they provide valuable prior information for a prediction beyond what can be extracted from the input alone. Here, we present a test-time adaptation method that exploits the spurious correlation phenomenon, in contrast to recent approaches that attempt to eliminate spurious correlations through invariance. We consider situations where the prior distribution $p(y, z)$, which models the marginal dependence between the class label $y$ and the nuisance factors $z$, may change across domains, but the generative model for features $p(\mathbf{x}|y, z)$ is constant. We note that this is an expanded version of the label shift assumption, where the labels now also include the nuisance factors $z$. Based on this observation, we train a classifier to predict $p(y, z|\mathbf{x})$ on the source distribution, and implement a test-time label shift correction that adapts to changes in the marginal distribution $p(y, z)$ using unlabeled samples from the target domain. We call our method "Test-Time Label-Shift Adaptation" or TTLSA. We apply our method to two different image datasets -- the CheXpert chest X-ray dataset and the colored MNIST dataset -- and show that it gives better downstream results than methods that try to train classifiers which are invariant to the changes in prior distribution. Code reproducing experiments is available at https://github.com/nalzok/test-time-label-shift .
translated by 谷歌翻译
Humans have perfected the art of learning from multiple modalities through sensory organs. Despite their impressive predictive performance on a single modality, neural networks cannot reach human level accuracy with respect to multiple modalities. This is a particularly challenging task due to variations in the structure of respective modalities. Conditional Batch Normalization (CBN) is a popular method that was proposed to learn contextual features to aid deep learning tasks. This technique uses auxiliary data to improve representational power by learning affine transformations for convolutional neural networks. Despite the boost in performance observed by using CBN layers, our work reveals that the visual features learned by introducing auxiliary data via CBN deteriorates. We perform comprehensive experiments to evaluate the brittleness of CBN networks to various datasets, suggesting that learning from visual features alone could often be superior for generalization. We evaluate CBN models on natural images for bird classification and histology images for cancer type classification. We observe that the CBN network learns close to no visual features on the bird classification dataset and partial visual features on the histology dataset. Our extensive experiments reveal that CBN may encourage shortcut learning between the auxiliary data and labels.
translated by 谷歌翻译
Event Extraction (EE) is one of the fundamental tasks in Information Extraction (IE) that aims to recognize event mentions and their arguments (i.e., participants) from text. Due to its importance, extensive methods and resources have been developed for Event Extraction. However, one limitation of current research for EE involves the under-exploration for non-English languages in which the lack of high-quality multilingual EE datasets for model training and evaluation has been the main hindrance. To address this limitation, we propose a novel Multilingual Event Extraction dataset (MEE) that provides annotation for more than 50K event mentions in 8 typologically different languages. MEE comprehensively annotates data for entity mentions, event triggers and event arguments. We conduct extensive experiments on the proposed dataset to reveal challenges and opportunities for multilingual EE.
translated by 谷歌翻译
Current pre-trained language models rely on large datasets for achieving state-of-the-art performance. However, past research has shown that not all examples in a dataset are equally important during training. In fact, it is sometimes possible to prune a considerable fraction of the training set while maintaining the test performance. Established on standard vision benchmarks, two gradient-based scoring metrics for finding important examples are GraNd and its estimated version, EL2N. In this work, we employ these two metrics for the first time in NLP. We demonstrate that these metrics need to be computed after at least one epoch of fine-tuning and they are not reliable in early steps. Furthermore, we show that by pruning a small portion of the examples with the highest GraNd/EL2N scores, we can not only preserve the test accuracy, but also surpass it. This paper details adjustments and implementation choices which enable GraNd and EL2N to be applied to NLP.
translated by 谷歌翻译
强化学习(RL)的成功在很大程度上取决于从环境观察中学习强大表示的能力。在大多数情况下,根据价值功能的变化,在各州之间纯粹通过强化学习损失所学的表示形式可能会有很大差异。但是,所学的表示形式不必非常具体地针对手头的任务。仅依靠RL目标可能会产生在连续的时间步骤中变化很大的表示形式。此外,由于RL损失的目标变化,因此所学的表示将取决于当前价值/策略的良好。因此,从主要任务中解开表示形式将使他们更多地专注于捕获可以改善概括的过渡动态。为此,我们提出了局部约束的表示,辅助损失迫使国家表示由邻近状态的表示可以预测。这不仅鼓励表示形式受到价值/政策学习的驱动,还可以自我监督的学习来驱动,这会限制表示表示的变化太快。我们在几个已知的基准上评估了所提出的方法,并观察到强劲的性能。尤其是在连续控制任务中,我们的实验比强基线显示出显着的优势。
translated by 谷歌翻译
近年来,由于深度学习体系结构的有希望的进步,面部识别系统取得了非凡的成功。但是,当将配置图像与额叶图像的画廊匹配时,它们仍然无法实现预期的准确性。当前方法要么执行姿势归一化(即额叶化)或脱离姿势信息以进行面部识别。相反,我们提出了一种新方法,通过注意机制将姿势用作辅助信息。在本文中,我们假设使用注意机制姿势参加的信息可以指导剖面面上的上下文和独特的特征提取,从而进一步使嵌入式域中的更好表示形式学习。为了实现这一目标,首先,我们设计了一个统一的耦合曲线到额定面部识别网络。它通过特定于类的对比损失来学习从面孔到紧凑的嵌入子空间的映射。其次,我们开发了一个新颖的姿势注意力块(PAB),以专门指导从剖面面上提取姿势 - 不合稳定的特征。更具体地说,PAB旨在显式地帮助网络沿着频道和空间维度沿着频道和空间维度的重要特征,同时学习嵌入式子空间中的歧视性但构成不变的特征。为了验证我们提出的方法的有效性,我们对包括多PIE,CFP,IJBC在内的受控和野生基准进行实验,并在艺术状态下表现出优势。
translated by 谷歌翻译
在本文中,我们试图在抽象嵌入空间中绘制额叶和轮廓面图像之间的连接。我们使用耦合编码器网络利用此连接将额叶/配置文件的面部图像投影到一个常见的潜在嵌入空间中。提出的模型通过最大化面部两种视图之间的相互信息来迫使嵌入空间中表示的相似性。拟议的耦合编码器从三个贡献中受益于与极端姿势差异的匹配面。首先,我们利用我们的姿势意识到的对比学习来最大程度地提高身份额叶和概况表示之间的相互信息。其次,由在过去的迭代中积累的潜在表示组成的内存缓冲区已集成到模型中,因此它可以比小批量大小相对较多的实例。第三,一种新颖的姿势感知的对抗结构域适应方法迫使模型学习从轮廓到额叶表示的不对称映射。在我们的框架中,耦合编码器学会了扩大真实面孔和冒名顶替面部分布之间的边距,这导致了相同身份的不同观点之间的高度相互信息。通过对四个基准数据集的广泛实验,评估和消融研究来研究拟议模型的有效性,并与引人入胜的最新算法进行比较。
translated by 谷歌翻译